6.2 Constraint
$$
\begin{array}{c|cc}
\rightarrow & C & V \\ \hline
C & p_{cc} & p_{cv} \\
V & p_{vc} & p_{vv}
\end{array}
\qquad (6.15)
$$
where $p_{cc}$ means the probability that a consonant is followed by another consonant, and similarly for the other terms. The matrix is stochastic; that is, each row must sum to 1. If the entries within every column are identical (that is, if all the rows are the same), then there is no dependence on the preceding symbol, and we revert to a random, or zeroth-order Markov, process; the short sketch below illustrates the first-order case.
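A minimal sketch in Python of the first-order matrix (6.15); the numerical values of $p_{cc}$, etc., are illustrative assumptions, not data from the text. It checks the stochasticity condition and generates a sequence in which each symbol depends only on its predecessor:

```python
import random

# Minimal sketch of the first-order matrix (6.15); the numerical values
# are illustrative assumptions. Rows index the preceding symbol,
# columns the following one.
P = {
    "C": {"C": 0.1, "V": 0.9},  # p_cc, p_cv
    "V": {"C": 0.7, "V": 0.3},  # p_vc, p_vv
}

# The matrix is stochastic: every row must sum to 1.
assert all(abs(sum(row.values()) - 1.0) < 1e-12 for row in P.values())

def sample(n, state="C"):
    """Generate n symbols, each depending only on its predecessor."""
    out = []
    for _ in range(n):
        state = "C" if random.random() < P[state]["C"] else "V"
        out.append(state)
    return "".join(out)

print(sample(20))  # e.g. 'VCVVCVCVVCVCVVCVVCVC'
```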
Suppose now that observation reveals that the probability of C occurring after V preceded by C is
different from that of C occurring after V preceded by V, or even that the probability
of C occurring after VV preceded by C is different from that of C occurring after
VV preceded by V. These higher-order Markov processes can be recoded in strict
Markov form; thus, for the second-order process (dependency of the probabilities on
the two preceding symbols) “VVC” can be written as a transition from VV to VC,
and hence the matrix of transition probabilities becomes
$$
\begin{array}{c|cccc}
\rightarrow & CC & CV & VC & VV \\ \hline
CC & p_{ccc} & p_{ccv} & 0 & 0 \\
CV & 0 & 0 & p_{cvc} & p_{cvv} \\
VC & p_{vcc} & p_{vcv} & 0 & 0 \\
VV & 0 & 0 & p_{vvc} & p_{vvv}
\end{array}
\qquad (6.16)
$$
and so on for higher orders. Notice that some transitions necessarily have zero probability.¹²
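The recoding can be sketched in a few lines of Python; the second-order probabilities below are illustrative assumptions. A transition from pair state $ab$ to pair state $b'c$ is possible only when $b = b'$, which reproduces the pattern of zeros in (6.16):

```python
from itertools import product

# Sketch of the recoding behind (6.16): a second-order chain over {C, V}
# becomes a first-order chain over the pair states CC, CV, VC, VV.
# p2[(a, b)][c] is the probability of symbol c after the pair ab;
# the numerical values are illustrative assumptions.
p2 = {
    ("C", "C"): {"C": 0.05, "V": 0.95},
    ("C", "V"): {"C": 0.60, "V": 0.40},
    ("V", "C"): {"C": 0.15, "V": 0.85},
    ("V", "V"): {"C": 0.80, "V": 0.20},
}

states = [a + b for a, b in product("CV", repeat=2)]  # CC, CV, VC, VV

# Start from an all-zero matrix; only transitions ab -> bc whose middle
# symbols agree receive nonzero probability, as in (6.16).
T = {s: {t: 0.0 for t in states} for s in states}
for (a, b), dist in p2.items():
    for c, p in dist.items():
        T[a + b][b + c] = p

for s in states:
    print(s, T[s], "row sum:", sum(T[s].values()))  # each row sums to 1
```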
The reader may object that one rarely composes text letter by letter, but rather
word by word. Clearly, there are strong constraints governing the succession of words
in a text. The frequencies of these successions can be obtained by counting successive word pairs in very long texts and are then used to construct the transition matrix,
which is, of course, gigantic even for a first-order process. We remark that a book
ending with “…in the solid state is greatly aided by this new tool” is more likely to
begin with “Rocket motor design received a considerable boost when …” than one
ending “I became submerged in my thoughts which sparkled with a cold light”.¹³
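The counting procedure itself is simple, as the following Python sketch shows; the toy text here is an illustrative assumption, whereas in practice one needs a very long corpus, and the resulting matrix is gigantic:

```python
from collections import Counter, defaultdict

# Sketch of estimating word-level transition probabilities by counting
# successive word pairs in a text; the toy text is an illustrative
# assumption standing in for a very long corpus.
text = "the cat sat on the mat and the cat slept"
words = text.split()

pair_counts = Counter(zip(words, words[1:]))  # counts of (w1, w2) successions
row_totals = Counter(words[:-1])              # occurrences of w1 as a predecessor

P = defaultdict(dict)
for (w1, w2), n in pair_counts.items():
    P[w1][w2] = n / row_totals[w1]  # relative-frequency estimate of p(w2 | w1)

print(dict(P["the"]))  # {'cat': 0.666..., 'mat': 0.333...}
```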
We note here that one may naturally attempt to model DNA or protein sequences as Markov processes, as will be discussed in Part III. Markov chains as such will be discussed more fully in Chap. 11.
The notion of constraint applies whenever a set “is smaller than it might be”. The
classic example is that of road traffic lights, which display various combinations of
red, amber, and green, each of which may be on or off. Although $2^3 = 8$ combinations are theoretically possible, in most countries only certain combinations are
used, typically only four out of the eight. Constraints are ubiquitous in the universe
¹² See also Sect. 11.2.
¹³ Good (1969) has shown that ordinary language cannot be represented even by a Markov process of infinite order. So-called hidden Markov models (HMM), discussed elsewhere in this book, offer in principle a more powerful representational tool.